PRITHIVSAKTHIUR's Repositories

100 repositories

1000-General-Knowledge-Flashcards
1000 Flashcards ( General, Sports, Technical, Space ) 📔📔
⭐ 6 🌐 Public
3D-Printed-Or-Not-SigLIP2
3D-Printed-Or-Not-SigLIP2 is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for binary image classification. It is trained to distinguish between images of 3D printed and non-3D printed objects using the SiglipForImageClassification architecture.
⭐ 1 🌐 Public
Age-Classification-SigLIP2
Age-Classification-SigLIP2 is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to predict the age group of a person from an image using the SiglipForImageClassification architecture.
⭐ 3 🌐 Public
Agent-Dino
Dino: The Minimalist Multipurpose Chat System
⭐ 2 🌐 Public
AI-Art-Generator-SDXL
AUTOMATIC1111: Software for tensor operations, saving tensor data in .safetensors format. ComfyUI: UI library, possibly managing tensor data safely with *.safetensors. InvokeAI: ML platform using *.safetensors for secure tensor storage.
⭐ 7 🌐 Public
AIorNot-SigLIP2
AIorNot-SigLIP2 is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for binary image classification. It is trained to detect whether an image is generated by AI or is a real photograph using the SiglipForImageClassification architecture.
⭐ 3 🌐 Public
Airbnb-NYC-Maps
Airbnb Price in NYC ( Select Boroughs )
⭐ 6 🌐 Public
All-In-One-Downloader
yt-dlp is a feature-rich command-line audio/video downloader with support for thousands of sites. The project is a fork of youtube-dl based on the now inactive youtube-dlc.
⭐ 6 🌐 Public
Alphabet-Sign-Language-Detection
Alphabet-Sign-Language-Detection is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to classify images into sign language alphabet categories using the SiglipForImageClassification architecture.
⭐ 1 🌐 Public
Anime-Classification-v0.1
Anime-Classification-v1.0 is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to classify anime-related images using the SiglipForImageClassification architecture.
⭐ 2 🌐 Public
Augmented-Waste-Classifier-SigLIP2
Augmented-Waste-Classifier-SigLIP2 is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224
⭐ 3 🌐 Public
Auto-Abliteration
Modify a language model's behavior by abliterating its weights.
⭐ 4 🌐 Public
Aya-Vision-Ocr-vs-Qwen2VL-Ocr
Messy Handwriting OCR Comparison Between Aya-Vision-8B and Qwen2VL-OCR-2B
⭐ 3 🌐 Public
Banana-Zoom
Banana Zoom is an advanced image enhancement web app that lets users select regions of an image for AI-powered upscaling and detail refinement. Using Google’s (nano banana)
⭐ 4 🌐 Public
Base64-to-Image-Encode
Convert Base64 to image online
⭐ 1 🌐 Public
bellatrix-tiny3-1b-webgpu
WebGPU-based LLM chatbot; try it in Chrome browsers
⭐ 2 🌐 Public
BERT-UNCASED
BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labeling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inputs and labels from those texts.
⭐ 6 🌐 Public
Bidirectional-and-Auto-Regressive-Transformer-CNN
BART’s primary task is to generate clean, semantically coherent text from corrupted text data, but it can also be used for a variety of NLP sub-tasks such as language translation, question answering, text summarization, and paraphrasing.
⭐ 6 🌐 Public
Bird-Species-Classifier-526
Bird-Species-Classifier-526 is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224
⭐ 1 🌐 Public
blog
Public repo for HF blog posts
⭐ 0 🌐 Public
Callisto-OCR
Vision and Language Processing. [ LaTeX OCR, Math Parsing, Text Analogy OCR ]
⭐ 1 🌐 Public
Canopus-Realism
Realistic Image Generation, Realistic trigger works properly, better for photorealistic trigger words, close-up shots, face diffusion, male, female characters.
⭐ 8 🌐 Public
Captioner-Pro-Demo
caption image
⭐ 1 🌐 Public
Chatbot-GPT
3-In-1-Chatbot - GPT
⭐ 5 🌐 Public
Client-Record-CURD-OPs-Exercise
Client Record Management - CRUD Ops + Blazor WebAssembly with Standalone App
⭐ 3 🌐 Public
Clipart-126-DomainNet
Clipart-126-DomainNet is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to classify clipart images into 126 domain categories using the SiglipForImageClassification architecture
⭐ 1 🌐 Public
Codepy-Deepthink-3B
step-by-step solutions, creative content, and logical analyses
⭐ 1 🌐 Public
Common-Voice-Gender-Detection
Speech-Emotion-Classification is a fine-tuned version of facebook/wav2vec2-base-960h for multi-class audio classification, specifically trained to detect emotions in speech. This model utilizes the Wav2Vec2ForSequenceClassification architecture to accurately classify speaker emotions from audio signals.
⭐ 1 🌐 Public
Convert-to-Onnx-Hf-Dir
Convert a Hugging Face model to ONNX & Upload Directly to Your Hf Model Repo
⭐ 1 🌐 Public
Coral-Health
Coral-Health is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to classify coral reef images into two health conditions using the SiglipForImageClassification architecture.
⭐ 2 🌐 Public
Core-OCR
A specialized optical character recognition (OCR) application built on advanced vision-language models, designed for document-level OCR, long-context understanding, and mathematical LaTeX formatting. Supports both image and video processing with multiple state-of-the-art model
⭐ 3 🌐 Public
Cosmos-x-DocScope
Understands physical common sense and generates appropriate embodied decisions. Optimized for document-level optical character recognition and long-context vision-language understanding. Built with a hand-curated dataset for text-to-image models, providing significantly more detailed descriptions or captions of given images.
⭐ 1 🌐 Public
CUA-GUI-Operator
A Gradio-based demonstration for Computer Use Agent (CUA) tasks, supporting multiple vision-language models: Microsoft Fara-7B, ByteDance UI-TARS-1.5-7B, Hcompany Holo2-4B, and Uniphore ActIO-UI-7B. Users upload UI screenshots (e.g., desktop or app interfaces).
⭐ 1 🌐 Public
DATA-BOARD
Data Boards - Visualization of various plots ( Analysis )
⭐ 2 🌐 Public
deepfake-detector-model-v1
deepfake-detector-model-v1 is a vision-language encoder model fine-tuned from siglip2-base-patch16-512 for binary deepfake image classification. It is trained to detect whether an image is real or generated using synthetic media techniques. The model uses the SiglipForImageClassification architecture.
⭐ 16 🌐 Public
Deepfake-Quality-Detection
Good and Bad Quality Deepfake Detection
⭐ 2 🌐 Public
Deepfake-vs-Real-8000
Deepfake vs Real is a dataset designed for image classification, distinguishing between deepfake and real images.
⭐ 1 🌐 Public
DeepSeek-OCR-experimental
A Gradio-powered web interface for performing advanced OCR tasks using the DeepSeek-OCR model. This experimental app leverages Hugging Face Transformers to process images for text extraction, document conversion, figure parsing, and object localization.
⭐ 2 🌐 Public
Doc-VLMs-exp
An experimental document-focused Vision-Language Model application that provides advanced document analysis, text extraction, and multimodal understanding capabilities. This application features a streamlined Gradio interface for processing both images and videos using state-of-the-art vision-language models specialized in document understanding.
⭐ 3 🌐 Public
Doc-VLMs-v2-Localization
Doc-VLMs-v2-Localization is a demo app for the Camel-Doc-OCR-062825 model, fine-tuned from Qwen2.5-VL-7B-Instruct for advanced document retrieval, extraction, and analysis. It enhances document understanding and also integrates other notable Hugging Face models.
⭐ 3 🌐 Public
DocScope-R1
A powerful multi-modal AI application that combines three state-of-the-art vision-language models for comprehensive image and video analysis. DocScope-R1 provides OCR capabilities, detailed scene understanding, and video content analysis through an intuitive Gradio interface.
⭐ 2 🌐 Public
Document-Type-Detection
Document-Type-Detection is a multi-class image classification model based on google/siglip2-base-patch16-224, trained to detect and classify types of documents from scanned or photographed images. This model is helpful for automated document sorting, OCR pipelines, and digital archiving systems.
⭐ 1 🌐 Public
dots.ocr-fix-demo
This Gradio application demonstrates the capabilities of the "dots.ocr" model, a powerful multilingual document parser.
⭐ 2 🌐 Public
DREX
drex-062225-exp (Document Retrieval and Extraction Expert) is a specialized fine-tuned version of docscopeocr-7b-050425-exp, optimized for document retrieval, content extraction, and analysis recognition. It is built on top of the Qwen2.5-VL architecture.
⭐ 1 🌐 Public
EHRM-Demo
No description
⭐ 6 🌐 Public
EHRM-Models
EHRM [ Electronic Health Record Management ] introduces a centralized platform for analyzing patient records, offering insights into billing amounts, demographics, prevalent diagnoses, medical conditions, consulted doctors, admission types, and medication usage.
⭐ 10 🌐 Public
EHRM-Website
The primary issue is the fragmented nature of patient records within traditional healthcare systems. These records are stored in disparate formats across various departments or facilities, which hinders comprehensive analysis and decision-making. Additionally, medical data is voluminous and heterogeneous.
⭐ 8 🌐 Public
Face-Mask-Detection
Face-Mask-Detection is a binary image classification model based on google/siglip2-base-patch16-224, trained to detect whether a person is wearing a face mask or not. This model can be used in public health monitoring, access control systems, and workplace compliance enforcement.
⭐ 1 🌐 Public
Face-Swap-Roop
Face-Swapper | Gradio Work Space | .hf.space
⭐ 10 🌐 Public
facial-age-detection
facial-age-detection is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-512 for multi-class image classification. It is trained to detect and classify human faces into age groups ranging from early childhood to elderly adults. The model uses the SiglipForImageClassification architecture.
⭐ 1 🌐 Public
Facial-Emotion-Detection-SigLIP2
Facial-Emotion-Detection-SigLIP2 is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224
⭐ 6 🌐 Public
Fara-7B-GUI-Operator
A Gradio-based demonstration for the Microsoft Fara-7B model, designed as a computer use agent. Users upload UI screenshots (e.g., desktop or app interfaces), provide task instructions (e.g., "Click on the search bar"), and receive parsed actions with visualized indicators overlaid on the image.
⭐ 4 🌐 Public
Fashion-Mnist-SigLIP2
Fashion-Mnist-SigLIP2 is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to classify images into Fashion-MNIST categories using the SiglipForImageClassification architecture.
⭐ 2 🌐 Public
Fashion-Product-Usage
Fashion-Product-Usage is a vision-language model fine-tuned from google/siglip2-base-patch16-224 using the SiglipForImageClassification architecture. It classifies fashion product images based on their intended usage context.
⭐ 1 🌐 Public
FineTuning-MetaCLIP-2
This repository demonstrates how to adapt a large-scale pretrained model, MetaCLIP 2, to a specific downstream task through fine-tuning: image classification.
⭐ 2 🌐 Public
FineTuning-SigLIP-2
Fine-Tuning SigLIP 2 for Single/Multi-Label Image Classification. Image classification vision-language encoder model fine-tuned for Image Classification Tasks
⭐ 45 🌐 Public
Fire-Detection-Siglip2
Fire-Detection-Siglip2 is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to detect fire, smoke, or normal conditions using the SiglipForImageClassification architecture.
⭐ 1 🌐 Public
Flood-Image-Detection
Flood-Image-Detection is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-512 for binary image classification. It is trained to detect whether an image contains a flooded scene or non-flooded environment. The model uses the SiglipForImageClassification architecture.
⭐ 2 🌐 Public
Florence-2-Image-Caption
This application utilizes the powerful Florence-2 vision-language model from Microsoft to generate comprehensive captions for images. The model is capable of understanding visual content and expressing it in natural language.
⭐ 6 🌐 Public
Flux-API
Endpoint Image Generation using Flux
⭐ 2 🌐 Public
Flux-Image-Captioner
FLUX.1-dev with Qwen2VL Captioner and Prompt Enhancer
⭐ 4 🌐 Public
Flux-Krea-multi-GPU-Pool
A Python-based multi-GPU image generation pipeline using Huggingface Diffusers with LoRA (Low-Rank Adaptation) support. This project distributes image generation workloads across all available GPUs on the system leveraging Python multiprocessing to optimize throughput and speed.
⭐ 1 🌐 Public
Flux-LoRA-DLC
Experience the power of the FLUX.1-dev diffusion model combined with a massive collection of 255+ community-created LoRAs! This Gradio application provides an easy-to-use interface to explore diverse artistic styles directly on top of the FLUX base model.
⭐ 13 🌐 Public
FLUX-LoRA-DLC2
Experience the power of the FLUX.1-dev diffusion model combined with a massive collection of 100+ community-created LoRAs! This Gradio application provides an easy-to-use interface to explore diverse artistic styles directly on top of the FLUX base model.
⭐ 4 🌐 Public
FLUX-REALISM
A Gradio-based web application for generating hyper-realistic images using FLUX.1-dev with Super Realism LoRA enhancement. This application provides an intuitive interface for creating high-quality, photorealistic images with customizable parameters and styles.
⭐ 16 🌐 Public
Flux-Sketch-Smudge-3to1
3:1 Best Image Gen
⭐ 1 🌐 Public
FLUX.1-Comparator-Krea-Dev
A high-performance Gradio application for comparing and generating images using two powerful FLUX.1 diffusion models: FLUX.1-dev-merged and FLUX.1-krea-merged-dev. This application provides an intuitive interface for AI-powered image generation with advanced customization options.
⭐ 2 🌐 Public
Flux.1-dev-4bit
FLUX.1-dev model with 4-bit quantization. The quantized model maintains image quality while significantly reducing GPU memory requirements, making it accessible to users with limited hardware resources.
⭐ 1 🌐 Public
Food-101-93M
Food-101-93M is a fine-tuned image classification model built on top of google/siglip2-base-patch16-224 using the SiglipForImageClassification architecture. It is trained to classify food images into one of 101 popular dishes, derived from the Food-101 dataset.
⭐ 1 🌐 Public
Formula-Text-Detection
Formula-Text-Detection is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for binary image classification. It is built using the SiglipForImageClassification architecture to distinguish between mathematical formulas and natural text in document or image regions.
⭐ 2 🌐 Public
GALLO-3XL
High Quality Image Generation Model - Powered with NVIDIA A100
⭐ 13 🌐 Public
Gemini-Image-Studio
A state-of-the-art image generation and editing tool powered by Google's Generative AI models. This React-based web application allows users to generate images from text prompts, edit existing images, or create images from hand-drawn sketches.
⭐ 3 🌐 Public
Gemini-Image-Studio-HF
A state-of-the-art image generation and editing tool powered by Google's Generative AI models. This React-based web application allows users to generate images from text prompts, edit existing images, or create images from hand-drawn sketches.
⭐ 4 🌐 Public
Gemma-3-Multimodal
Gemma 3 [ Image-text-text ] [ video inference ] [ multi image chat ]
⭐ 9 🌐 Public
Gen-Vision
Multiple Conditioned Image Generation, SDXL, Low-rank adaptation Refined
⭐ 4 🌐 Public
Gen-Vision-Multimodal
image, text, image-text-text, ocr
⭐ 2 🌐 Public
Gender-Classifier-Mini
Gender-Classifier-Mini is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to classify images based on gender using the SiglipForImageClassification architecture.
⭐ 1 🌐 Public
Geometric-Shapes-Classification
Geometric-Shapes-Classification is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a multi-class shape recognition task. It classifies various geometric shapes using the SiglipForImageClassification architecture.
⭐ 1 🌐 Public
GiD-Land-Cover-Classification
GiD-Land-Cover-Classification is a multi-class image classification model based on google/siglip2-base-patch16-224, trained to detect land cover types in geographical or environmental imagery. This model can be used for urban planning, agriculture monitoring, and environmental analysis.
⭐ 1 🌐 Public
Gliese-CUA-Tool-Call-8B-Demo
A Gradio-based demonstration for the prithivMLmods/Gliese-CUA-Tool-Call-8B model, a Computer Use Agent (CUA) specialized in GUI understanding and tool-calling actions.
⭐ 1 🌐 Public
Gliese-CUA-Tool-Call-8B-Localization-Demo
A Gradio-based demonstration for the prithivMLmods/Gliese-CUA-Tool-Call-8B model, specialized in GUI element localization. Users upload UI screenshots, provide task instructions (e.g., "Click on the search bar"), and receive predicted click coordinates in Click(x, y) format.
⭐ 1 🌐 Public
GLM-4.1V-9B-Thinking-Video-Understanding
GLM-4.1V-9B-Thinking is designed to explore the upper limits of reasoning in vision-language models. By introducing a "thinking paradigm" and leveraging reinforcement learning, the model significantly enhances its capabilities.
⭐ 3 🌐 Public
Grab-Doc
Chat Response Documentation
⭐ 2 🌐 Public
Grab-Doc-V
MS Word Like Content Creation System
⭐ 2 🌐 Public
GRID-6X
Layout for Seamless Image Assembly
⭐ 1 🌐 Public
GWQ
Gemma with Questions
⭐ 1 🌐 Public
GwQ2B
gemma with questions
⭐ 1 🌐 Public
Gym-Workout-Classifier-SigLIP2
Gym-Workout-Classifier-SigLIP2 is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224
⭐ 1 🌐 Public
Hand-Gesture-2-Robot
Hand-Gesture-2-Robot is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to recognize hand gestures and map them to specific robot commands using the SiglipForImageClassification architecture.
⭐ 1 🌐 Public
Herculis-CUA-GUI-Actioner-4B-Demo
Demo: Herculis-CUA-GUI-Actioner-4B is a Computer Use Agent (CUA) multimodal model designed for GUI understanding, UI localization, and action execution across web, desktop, and mobile environments
⭐ 1 🌐 Public
HF-POSTS-RECEIPT
hf receipt, bs4, jinja2
⭐ 0 🌐 Public
Hindi-Sign-Language-Detection
Hindi-Sign-Language-Detection is a vision-language model fine-tuned from google/siglip2-base-patch16-224 for multi-class image classification. It is trained to detect and classify Hindi sign language hand gestures into corresponding Devanagari characters using the SiglipForImageClassification architecture.
⭐ 1 🌐 Public
Hospital-Management-System
Hospital Management System Using a Streamlit Application
⭐ 10 🌐 Public
How-to-run-huggingface-spaces-on-local-machine-demo
Running Hugging Face Spaces on a local machine or a Colab T4 GPU involves several steps. Hugging Face Spaces is a platform to host machine learning demos and applications using Streamlit, Gradio, or other frameworks.
⭐ 21 🌐 Public
Huggingface-Android-Application
URL to App Conversion
⭐ 14 🌐 Public
Human-Action-Recognition
Human-Action-Recognition is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for multi-class human action recognition. It uses the SiglipForImageClassification architecture to predict human activities from still images.
⭐ 3 🌐 Public
Human-vs-NonHuman-Detection
Human-vs-NonHuman-Detection is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to classify images as either human or non-human using the SiglipForImageClassification architecture.
⭐ 1 🌐 Public
HunyuanOCR-Demo
A Gradio-based demonstration application for the Tencent HunyuanOCR model, focused on optical character recognition (OCR) tasks such as text detection, extraction, and coordinate formatting from images. Users can upload images, customize prompts (e.g., for Chinese/English text).
⭐ 2 🌐 Public
Image-Captioning-Salesforce-Blip
The BlipProcessor and BlipForConditionalGeneration are likely classes specific to a model called "Blip," which seems to be a transformer-based model for conditional text generation.
⭐ 7 🌐 Public
Image-Guard-2.0
No description
⭐ 0 🌐 Public